BRAT: A Random Walk Through the Semantic Spaces of the Blogosphere

Authors

  • Adil El Ghali
  • Yann Vigile Hoareau
Abstract

Semantic spaces, such as Latent Semantic Analysis (LSA), Hyperspace Analog to Language (HAL) or Random Indexing (RI), offer convenient methods to represent semantic relations between words and concepts, abstracted from a distribution of documents. The distribution of documents determines the local co-occurrence patterns between words across the corpus and therefore determines the semantics abstracted from that local distribution. Such methods are sensitive to the statistical properties of the distribution of words over documents. For instance, the semantics of the word table abstracted from a scientific corpus may differ from those abstracted from a general corpus. In the first case, since table may occur in the context of table of correlation or table of results, it would be considered to be associated with the word correlation, whereas in the second case, because it may co-occur with kitchen or living-room, it would rather be considered similar to chair. Nevertheless, the formal relation between the properties of the distribution of word co-occurrences and the final semantics produced by semantic space methods has not been described until now. In the case of a mixed “scientific and general” corpus, what makes the semantics of table become more similar to chair than to Spearman, or vice versa? We approached the Top-stories task of the Blog-Track’09 using a system named Blogosphere Random Analysis using Texts (BRAT), composed of two layers. The first layer distributes and represents blog posts in different semantic spaces built using Random Indexing. The second layer is a retrieval algorithm that aims to navigate the semantic space via a random walk. BRAT has been constructed under two main working hypotheses that we considered important for dealing with the semantics of the blogosphere: the notion
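The table example and the two-layer idea can be illustrated with a minimal sketch. It is not BRAT's implementation: the abstract does not give the system's parameters, context definition, or walk rule, so everything below is an assumption made for illustration. The assumptions are document-level contexts, sparse ternary index vectors, a walk whose transition probability is proportional to non-negative cosine similarity, and all function names, corpora and parameter values (`build_space`, `random_walk`, `dim`, `nonzeros`).

```python
import math
import random
from collections import defaultdict


def index_vector(dim, nonzeros, rng):
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzeros)):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def build_space(documents, dim=300, nonzeros=10, seed=0):
    """Random Indexing with document contexts (an assumed variant):
    a word's vector is the sum of the index vectors of the documents
    in which it occurs."""
    rng = random.Random(seed)
    space = defaultdict(lambda: [0.0] * dim)
    for doc in documents:
        doc_vec = index_vector(dim, nonzeros, rng)
        for word in doc.lower().split():
            ctx = space[word]
            for i, value in enumerate(doc_vec):
                ctx[i] += value
    return dict(space)


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def random_walk(space, start, steps, rng):
    """Hypothetical walk rule: move to another word with probability
    proportional to its non-negative cosine similarity to the current word."""
    path = [start]
    current = start
    for _ in range(steps):
        candidates = [w for w in space if w != current]
        weights = [max(cosine(space[current], space[w]), 0.0) for w in candidates]
        if sum(weights) == 0:
            break  # no similar neighbour to move to
        current = rng.choices(candidates, weights=weights, k=1)[0]
        path.append(current)
    return path


# Toy corpora (invented for illustration only).
scientific = ["table of correlation", "table of results", "correlation results"]
general = ["table chair kitchen", "chair table living-room", "kitchen chair"]

sci_space = build_space(scientific)
gen_space = build_space(general)

sim_table_correlation = cosine(sci_space["table"], sci_space["correlation"])
sim_table_chair = cosine(gen_space["table"], gen_space["chair"])
sim_table_kitchen = cosine(gen_space["table"], gen_space["kitchen"])
walk = random_walk(gen_space, "table", steps=5, rng=random.Random(1))
```

In this toy setting, table is close to correlation in the scientific space and has no relation to chair there, because chair never occurs in that corpus. In the general space, table is closer to chair than to kitchen because it shares more documents with chair.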

Similar resources

A Random Walk Approach to Modeling the Dynamics of the Blogosphere

It is important to develop intuitive and tractable generative models to simulate the topological and temporal dynamics of the blogosphere because these models provide insights about its structural evolution. In such generative models, independent instances of individual bloggers are initiated and these instances interact with each other to simulate the evolution of the blogosphere. Existing gen...

A PRELUDE TO THE THEORY OF RANDOM WALKS IN RANDOM ENVIRONMENTS

A random walk on a lattice is one of the most fundamental models in probability theory. When the random walk is inhomogeneous and its inhomogeneity comes from an ergodic stationary process, the walk is called a random walk in a random environment (RWRE). The basic questions such as the law of large numbers (LLN), the central limit theorem (CLT), and the large deviation principle (LDP) are ...

Semantic Factors: Students’ Sense of Belonging to Outdoor School Spaces

School is an environment which brings out students’ hidden talents. Paying attention to an appropriate context and environment has a huge impact on achieving this goal. The purpose of this study was to determine and evaluate the semantic factors that, according to Iranian experts, influence high school students’ sense of belonging. To this end, firstly data were collected th...

A Random Walk with Exponential Travel Times

Consider the random walk among N places with N(N - 1)/2 transports. We attach an exponential random variable Xij to each transport between places Pi and Pj and take these random variables mutually independent. If transports are possible or impossible independently with probability p and 1-p, respectively, then we give a lower bound for the distribution function of the smallest path at point log...

Central Limit Theorem in Multitype Branching Random Walk

A discrete time multitype (p-type) branching random walk on the real line R is considered. The positions of the j-type individuals in the n-th generation form a point process. The asymptotic behavior of these point processes, when the generation size tends to infinity, is studied. The central limit theorem is proved.

Publication date: 2009